STAT 679: Problem Set #3
Q1. Glacial Lakes
The data at this link contain labels of glacial lakes the Hindu Kush Himalaya, created during an ecological survey in 2015 by the International Centre for Integrated Mountain Development.
Part (a)
a.1 How many lakes are in this dataset?
There are 3624 lakes in this data set.
a.2 What are the latitude / longitude coordinates of the largest lakes in each Sub-basin?
largest_lakes = group_by(lakes, Sub_Basin) %>%
filter(Area == max(Area)) %>%
select(GL_ID,Sub_Basin,Area,Latitude,Longitude)| GL_ID | Sub_Basin | Area | Latitude | Longitude | geometry |
|---|---|---|---|---|---|
| GL086304E28374N | Arun | 4.0192059 | 28.37403 | 86.30475 | POLYGON ((86.29541 28.35189… |
| GL085838E28322N | Sun Koshi | 5.4113912 | 28.32223 | 85.83813 | POLYGON ((85.84835 28.32543… |
| GL086925E27898N | Dudh Koshi | 1.3428310 | 27.89853 | 86.92510 | POLYGON ((86.9127 27.89883,… |
| GL087866E27869N | Tamor | 0.6961043 | 27.86953 | 87.86618 | POLYGON ((87.85986 27.86149… |
| GL086447E27946N | Tama Koshi | 1.7197430 | 27.94679 | 86.44713 | POLYGON ((86.43581 27.93609… |
| GL085717E28042N | Indrawati | 0.0269401 | 28.04209 | 85.71757 | POLYGON ((85.71581 28.04205… |
| GL086542E27713N | Likhu | 0.0903048 | 27.71321 | 86.54286 | POLYGON ((86.54486 27.71284… |
| GL082948E29196N | Bheri | 4.9235690 | 29.19634 | 82.94822 | POLYGON ((82.94605 29.21763… |
| GL082423E29384N | Tila | 0.4322915 | 29.38408 | 82.42375 | POLYGON ((82.42288 29.38101… |
| GL081554E29648N | Karnali | 0.1262245 | 29.64815 | 81.55488 | POLYGON ((81.55656 29.64652… |
| GL082414E29753N | Mugu | 0.4283468 | 29.75376 | 82.41434 | POLYGON ((82.41277 29.75636… |
| GL081526E29772N | West Seti | 0.5075936 | 29.77282 | 81.52618 | POLYGON ((81.52834 29.77348… |
| GL081577E29897N | Kawari | 0.2839747 | 29.89755 | 81.57738 | POLYGON ((81.57937 29.89932… |
| GL081780E30128N | Humla | 0.7565747 | 30.12892 | 81.78076 | POLYGON ((81.77613 30.13338… |
| GL080178E30564N | Kali | 0.2396768 | 30.56462 | 80.17877 | POLYGON ((80.18417 30.56559… |
| GL084116E28446N | Seti | 0.1024868 | 28.44681 | 84.11694 | POLYGON ((84.11637 28.44483… |
| GL085519E28467N | Trishuli | 0.4539165 | 28.46783 | 85.51938 | POLYGON ((85.51812 28.47, 8… |
| GL084628E28596N | Budhi Gandaki | 0.2674945 | 28.59618 | 84.62826 | POLYGON ((84.63338 28.59795… |
| GL083851E28690N | Marsyangdi | 3.3841276 | 28.69089 | 83.85184 | POLYGON ((83.84369 28.70758… |
| GL083701E29218N | Kali Gandaki | 0.4344556 | 29.21850 | 83.70156 | POLYGON ((83.7026 29.21713,… |
Part (b)
Plot the polygons associated with each of the lakes identified in part (a).
Hint: You may find it useful to split lakes across panels using
the tm_facets function. If you use this approach, make sure
to include a scale with tm_scale_bar(), so that it is clear
that lakes have different sizes.
tm_shape(largest_lakes) +
tm_polygons(col='Area', palette="Blues",legend.show=F) +
tm_facets(by='Sub_Basin', free.scales = F, ncol=5, scale.factor = 4) +
tm_scale_bar(width=0.5) +
tm_layout(
main.title = "Largest lakes in each Sub Basin", main.title.position = c('center','top'),
panel.label.bg.color="darkgoldenrod4", panel.label.fontface='bold', panel.label.color = page_bg_color,
bg.color = "darkseagreen", outer.bg.color = page_bg_color, main.title.fontfamily = "Candara"
)Part (c)
Visualize all lakes with latitude between 28.2 and 28.4 and with longitude between 85.8 and 86. Optionally, add a basemap associated with each lake.
lakes_subset = lakes %>%
filter((Latitude>=28.2 & Latitude<=28.4) & (Longitude>=85.8 & Longitude<=86.0))
basemap = cc_location(loc=c(85.9,28.3), buffer=1e4)
tm_shape(basemap) +
tm_rgb(alpha=0.9) +
tm_shape(lakes_subset) +
tm_polygons(col='deepskyblue3') +
tm_layout(bg.color=page_bg_color, inner.margins=c(0,0), frame=F, saturation=-1)Q2. Australian Pharmaceuticals II
The PBS data set contains the number of orders filed every month for different classes of pharmaceutical drugs, as tracked by the Australian Pharmaceutical Benefits Scheme. The code below takes the full PBS data set and filters down to the 10 most commonly prescribed pharmaceutical types. This problem will ask you to implement and compare two approaches to visualizing this data set.
pbs_full <- read_csv("../data/PBS_random.csv") %>%
mutate(Month = as.Date(Month))
top_atcs <- pbs_full %>%
group_by(ATC2_desc) %>%
summarise(total = sum(Scripts)) %>%
slice_max(total, n = 10) %>%
pull(ATC2_desc)
pbs <- pbs_full %>%
filter(ATC2_desc %in% top_atcs, Month > "2007-01-01")Part (a)
Implement a stacked area visualization of these data.
ggplot(pbs) +
geom_area(aes(Month,Scripts/1e6, col=ATC2_desc, fill = ATC2_desc), alpha=0.6) +
scale_x_date(expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0), n.breaks=11) +
scale_fill_brewer(palette = "Paired") +
scale_colour_brewer(palette = "Paired") +
labs(
title = "Sales of 10 most commonly prescribed pharma drugs from 2007 till date",
y = "Orders sold (in millions)", fill = "CLASS OF DRUG", col = "CLASS OF DRUG"
)Part (b)
Implement an alluvial visualization of these data.
ggplot(pbs) +
geom_alluvium(aes(Month, Scripts/1e6, fill = ATC2_desc, col = ATC2_desc, alluvium = ATC2_desc), decreasing = FALSE, alpha = 0.6) +
scale_x_date(expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0), n.breaks=11) +
scale_fill_brewer(palette = "Paired") +
scale_colour_brewer(palette = "Paired") +
labs(
title = "Sales of 10 most commonly prescribed pharma drugs from 2007 till date",
y = "Orders sold (in millions)", fill = "CLASS OF DRUG", col = "CLASS OF DRUG"
)Part (c)
Compare and contrast the strengths and weaknesses of these two visualization strategies. Which user queries are easier to answer using one approach vs. the other?